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The Laser Interferometer Space Antenna (LISA) will operate as an AM/FM receiver for gravita- 
tional waves. For binary systems, the source location, orientation and orbital phase are encoded in 
the amplitude and frequency modulation. The same modulations spread a monochromatic signal 
over a range of frequencies, making it difficult to identify individual sources. We present a method 
for detecting and subtracting individual binary signals from a data stream with many overlapping 
signals. 



INTRODUCTION 



Estimates of the low frequency gravitational wave 
background below ~ 3 mHz [J Q have suggested that 
the profusion of binary stars in the galaxy will be a sig- 
nificant source of noise for space based gravitational wave 
observatories like LISA |3j . Most of these binary sources 
are expected to be monochromatic, evolving very little 
over the lifetime of the LISA mission; they will thus be 
ever present in the data stream, and data analysis tech- 
niques will need to be developed to deal with them. 

Below ~ 3 mHz, it is expected that there will be more 
than one binary contributing to the gravitational wave 
background in a given frequency resolution bin. Predic- 
tions suggest that the population of binaries will be so 
large as to produce a confusion limited background which 
will effectively limit the performance of the instrument. 
In this regime, it is likely that time delay interferome- 
try techniques can be employed to characterize the back- 
ground [4|. At higher frequencies, open bins appear and 
individual galactic binaries (in principle) become resolv- 
able as single monochromatic lines in the Fourier record 
(a "binary forest"). Complications arise, however, from 
the orbital motion of the LISA detector, which will mod- 
ulate the signal from an individual source, spreading the 
signal over many frequency bins. A quick method for 
demodulating the effect of the orbital motion on contin- 
uous gravitational wave sources has recently been demon- 
strated [U- 

Unlike sources for ground based observatories, the 
gravitational waves from low frequency galactic bina- 
ries are expected to be well understood. In principle, 
it should be possible to use knowledge about the ex- 
pected gravitational wave signals to "subtract" individ- 
ual sources out of the LISA data stream, both at high 
frequencies where individual sources are resolvable and at 
lower frequencies where single bright sources will stand 
out above the rms level of the confusion background. The 
ability to perform binary subtraction in LISA data anal- 
ysis is particularly important in the regime of the LISA 
floor (from ~ 3 mHz to the LISA transfer frequency, 
/* = c/(2nL) 10 mHz), where an overlapping popula- 
tion of galactic binaries will severely limit our ability to 



detect and study gravitational waves from other sources, 
such as the extreme mass ratio inspiral of compact ob- 
jects into supermassive black holes 

As will be seen, the problem of subtracting a binary 
out of the data stream is intimately tied to the problem of 
source identification, which is complicated by the motion 
of the LISA detector. Several authors 0, |j| have previ- 
ously examined the angular resolution of the observatory 
as a function of the time dependent orientation. The bi- 
nary subtraction problem has received some attention in 
the t>ast|lO|. but the work was not published. 

This paper examines the problem of binary subtrac- 
tion using a variant of the CLEAN algorithm ^lj from 
electromagnetic astronomy as a model for the subtrac- 
tion procedure. The CLEAN algorithm may be concisely 
described in a few steps: 

• Identify the brightest source in the data. 

• Using a model of the instrument's response func- 
tion, subtract a small portion of signal out of the 
data, centered on the bright source. 

• Remember how much was subtracted and where. 

• Iterate the first three steps until some prescribed 
level in the data is reached. 

• From the stored record of subtractions, rebuild in- 
dividual sources poj. 

The implementation of the CLEAN algorithm in this pa- 
per is built around a search through a multidimensional 
template space which covers a binary source's frequency 
and amplitude, sky position, inclination, polarization and 
orbital phase. 

The format of this paper will be as follows: Section 
Hll describes the modulation of gravitational wave sig- 
nals by the motion of the LISA detector with respect 
to the sky. Sections II I II and IIVI outline the description 
of the binaries used in this work, and the effect of the 
detector motion on their signals. Section [V] describes 
the template space used to implement the gravitational 
wave CLEAN ("gCLEAN") algorithm. Section |Vl| re- 
views the expected contributions of instrumental noise 



2 



and the effects on the data analysis procedure. Section 
IV 111 describes and demonstrates the gCLEAN procedure 
in detail. Lastly, a discussion of outstanding problems 
and future work is given in Section IVIIII 



II. SIGNAL MODULATION 

LISA's orbital motion around the Sun introduces am- 
plitude, frequency and phase modulation into the ob- 
served gravitational wave signal. The amplitude mod- 
ulation results from the detector's antenna pattern be- 
ing swept across the sky, the frequency modulation is 
due to the Doppler shift from the relative motion of the 
detector and source, and the phase modulation results 
from the detector's varying response to the two gravi- 
tational wave polarizations. The general expression de- 
scribing the strain measured by the LISA detector is quite 
complicated 12j , but we need only consider low frequency, 
monochromatic plane waves. Here low frequency is de- 
fined relative to the transfer frequency[l3j of the LISA 
detector, /* 10 mHz. The low frequency LIS A re- 
sponse function was first derived by Cutler |8|, but we 
shall use the simpler description given in Ref. 12]. 

A monochromatic plane wave propagating in the D, di- 
rection can be decomposed: 

h(t, f) = A + cos(27r/t + tp Q )e + + A x sin(27r ft + (p Q )e x , 

where A + and A x are the amplitudes of the two polar- 
ization states and 



e = p®p-q<&q, 
e x = p® q + q®p, 



(2) 



are polarization tensors. Here p and q are vectors that 
point along the principal axes of the gravitational wave. 
For a source located in the n = — 57 direction described 
by the ecliptic coordinates (0, 0) we can construct the 
orthogonal triad 

u = cos 9 cos x + cos 8 sin y — sin 9 z 
v = sin £ — cos y 

h = sin cos x + sin 8 sin 4>y + cos 8 z . (3) 

This allows us to write 

e + = cos 2 ip e + — sin 2ip e x , 
e x = sin2V>e + + cos2?/>e x , 

where 



(4) 



e 

e > 



u ® u — V ® V, 
u® v + V ® u, 



and the polarization angle ip is defined by 

v ■ p 



tamp 



u ■ p 



(5) 



(6) 



The strain produced in the detector is given by 

s(t) = A+F + {t) cos$(i) + A X F X (t) sin$(t) , (7) 
where 

^{t) = 2tt ft + Vo + 4> D {t). (8) 

Here <pr>(t) describes the Doppler modulation and F + (t), 
F x (t) are the detector beam patterns 



F+(t) 



1 r 

-[cos 
1 



2ijjD + {t) - sin2i/>L> x (t)] 



F x (t) = - [sm2iJ>D + (t) + cos2iPD x (t)] , (9) 



where 



AT 

D + (t) = -j— - 36 sin 2 (9 sin(2a(f) - 2A) 

+ (3 + cos 28) ( cos 2^(9 sin 2A - sin(4a(i) - 2A)) 

+ sin20(cos(4a(f) - 2A) - 9cos2A) 
-4V3sin26>(sin(3a(f) - 2 A - 0) 

-3sin(a(t) - 2A + 0)) 



(10) 



and 



D x (t) = -4[\/3cos6»f9cos(2A- 20) 
16 L V 

- cos(4a(i) - 2A - 20) J 
- 6sin6»(cos(3a(i) - 2A - 0) 
+3cos(a(f) - 2A + 0)) 



(11) 



The quantity a(t) = 2Trf m t+K describes the orbital phase 
of the LISA constellation, which orbits the Sun with fre- 
quency f m = year -1 . The constants k and A specify the 
initial orbital phase and orientation of the detector [l^. 
We set K = and A = 37r/4 in order to reproduce the 
initial conditions chosen by Cutler |Sj. The Doppler mod- 
ulation depends on the source location and frequency, 
and on the velocity of the guiding center of the detector: 

R 

(j) D {t) = 2nf— sin 6» cos(27r/ m t - 0) (12) 
c 

Here R is the separation of the detector from the barycen- 
ter, so R/c is the light travel time from the guiding center 
of the detector to the barycenter. 

The expression for the strain in the detector can be 
re-arranged using double angle identities to read: 



where 



s(t) =A(t)cos*(i) (13) 



4- (t) = 2tt/< + ip + 4> D (i) + <p P (t) . (14) 
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The amplitude modulation A(t) and phase modulation 
4>p{t) are given by 



A(t) = [(A + F+(t)) 2 + (A x F*(t)) 



21 1 / 2 



>(t) 



-arctan 



AyF*(t) 

A+F+(t) 



(15) 



(16) 



Each of the modulation functions are periodic in harmon- 
ics of f m . To get a feel for how each modulation affects 
the signal, we begin by turning off all but one modulation 
at a time and look at how each individual term affects 
the signal. 



A. Amplitude modulation 

Amplitude modulation derives from the sweep of the 
detector's antenna pattern across the sky due to the ob- 
servatory's orbital motion, which for LISA gives a mod- 
ulation frequency, f rn — 1/year. Pure amplitude modu- 
lation takes the form 



s(t) = A(t) cos(2irft + tp ). 



(17) 



The amplitude, A(t) is modulated by the orbital motion, 
and may be expanded in a Fourier series: 

oo 

A(t) = a n e 2 ™f™ nt (18) 

n— — oo 

which allows the signal in Eq. H17fl to be written 

«(<) = »( jr a n e 2 ^ f+Mt e^°\ . (19) 



\n--oo 



Thus, the Fourier power spectrum of s(t) will have side- 
bands about the carrier frequency / of the signal, spaced 
by the modulating frequency f m . The bandwidth, B, of 
the signal is defined to be the frequency interval which 
contains 98% of the total power: 

B = 2Nf m , (20) 

where N is determined empirically by 



N 



Y KI 2 >o.98 Y k 



--N 



n— — oo 



Typical LISA sources give rise to an amplitude modu- 
lation with N — 2 and using Eq. (|20|l a bandwidth of 
B = 4/ m = 1.3 x 1CT 4 mHz. 



B. Frequency modulation 

Doppler (frequency) modulation of signals occurs be- 
cause of relative motion between the detector and the 



source, and depends on the angle between the wave prop- 
agation direction f2 and the velocity vector of the guiding 
center. Pure Doppler modulation takes the form 

s(t) = A cos [2nft + (3 cos{2ir f m t + 5) + tp ] , (22) 

where (3 and <5 are constants. Using the Jacobi-Anger 
identity to write the Fourier expansion as 



i/3 sin(27r/, 



27Tif m nt 



(23) 



allows the signal in Eq. (1221 to be written 



s(t) = $t I A J n (P)e 2m< - f+Mt e l ^e m{s+7r/2) . 

\ 71 — — CO / 

(24) 

Once again, the Fourier power spectrum of s(t) will have 
sidebands about the carrier frequency /, spaced by the 
modulating frequency f m . The bandwidth of the signal 
is given by 



B = 2(l +/3)f„ 



(25) 



For LISA, the parameter fj (called the modulation index), 
which encodes the description of the detector motion rel- 
ative to the source, is given by 



/3 = 2tt/- sin 



(26) 



Sources in the equatorial plane have bandwidths ranging 
fromB = 2.6 xlCT 4 mHz at / = 1 mHz to B = 2.1 xl(T 3 
mHz at / = 10 mHz. 



C. Phase modulation 

Phase modulation is a consequence of the fact that 
the detector has different sensitivities to the two gravita- 
tional wave polarization states, + and x, characterized 
by the two detector beam patterns, F + (t) and F x (t). 
The variation of these beam patterns is a function of the 
detector motion (see Eq. ©), and modulates the phase. 
Phase modulation takes a similar form to the frequency 
modulation. Expanding <frp(t) in a Fourier sine series 
yields a signal 



(21) s (t)=A cos(2tt ft + <ys + Pn sin(2?r f m nt + 5 n )). (27) 



Again, the the Fourier power spectrum of s(t) has side- 
bands about the carrier frequency /, spaced by the 
modulating frequency f rn . The main difference is that 
the Fourier amplitude of the k th sideband (located at 
/ + kf m ) is given by 



Cfe 



A WY. JiAMe u " S -e^ where fc = ^^ 



n i„ 



(28) 
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Since the (3 n 's for LISA are independent of frequency (at 
least in the low frequency approximation used here), the 
bandpass of the phase modulated signal is independent of 
the carrier frequency. Empirically wc find the bandwidth 
B w 1CT 4 mHz. 



D. Total modulation 

It is possible to combine the amplitude, frequency and 
phase modulations together to arrive at an analytic ex- 
pression for the full signal modulation. The carrier fre- 
quency / develops sidebands spaced by the modulation 
frequency f m . The total modulation is most easily com- 
puted beginning from Q. One can write 



E Pn* 



2-KlfrnTLt 



n— — 4 



E 

n— — 4 



27rif m nt 



(29) 



where the small number of non-zero Fourier coefficients 
can be attributed to the quadrupole approximation for 
the beam pattern. The coefficients p n and c n can be 
read off from © in terms of (9, <p) and i/j. 

Our next task is to Fourier expand cos$(i) and 
sin <!>(£), being careful to take into account the fact that 
we are doing finite time Fourier transforms, so / will not 
be an integer multiple of f m . In other words, writing 



JY 



2mft 



= ^2 ° n< 

n=~N 



2-Kif m nt 



we find in the limit N 3> 1 that 

a n ~ sinc(7ra; n )e l7r:r ™ where x r , 



fn 



(30) 



(31) 



The coefficients are highly peaked about n = int(/// m ), 
where the function "int" returns the nearest integer to 
its argument. The maximum bandwidth occurs when 
the remainder / / f m — n equals 1/2; the max bandwidth 
is equal to 20/ m for 98% power (36/ m for 99% power). 
Putting everything together we find 



F+cos$(i) = 3? 



J k {P)e 2wifmkt e ik 



(tt/2-0) 



xe 



i -Pa 



L k 



2mf m lt 



2irif rn nt 



(32) 



and 



F x sin$(*) = 9 



2irif m kt ik(n/2- 



xe" 



E c;f 



L k 



2mf m lt 



L l 



E a «* 



,'2,'KiftnTlt 



(33) 



It follows that the Fourier expansion of s(t) is described 
by the triple sum 



n i' : '> 



k 



MP) 



(34) 

where q = k + l + n. The limited bandwidth of the various 
modulations allows us to restrict the sums: — (1 + f3) < 
k < (1 + P), -4 < I < 4 and -10 < n - int(/// m ) < 10. 
Using 13411 we can compute the discrete Fourier transform 
of s(t) very efficiently. 

The source identification and subtraction scheme used 
in this work depends on the development and use of a 
template bank covering a large parameter space. As 
such, issues related to efficient computing are of interest 
in order to make the problem tractable in a reasonable 
amount of time. A number of simplifying factors allow 
the problem to be compactified significantly, with great 
savings in computational efficiency. 

The quantities p n and c„ only depend on 9, <fi, and ip, 
so they can be pre-computed and stored. The complete 
template bank can then be built using l|34|) by stepping 
through a grid in /, (po and the ratio A x /A + . The com- 
putational saving as compared to directly generating s(t) 
for each of the six search parameters is a factor of ~ 10 5 
in computer time. 

Another big saving in computer time is based on the 
following observation: The Fourier expansions of sources 
a and b with all parameters equal save their frequencies, 
which differ by an integer multiple, m, of the modulation 
frequency f m , are related: 



c 



we have 



(35) 



This 



Thus, so long as to < 10 4 

allows us to use a set of templates generated at a fre- 
quency / to cover frequencies between / ± 10 4 / m . These 
savings mean that our Fourier space approach to calcu- 
lating the template bank are a factor of 10 9 times faster 
than a direct computation in the time domain! 



III. BINARY SOURCES 

With the exception of systems that involve super- 
massive black holes, all of the binary systems that can 
be detected by LISA are well described by the post- 
Newtonian approximation to general relativity. Most 
of these sources can be adequately described as circu- 
lar Newtonian binaries, and the gravitational waves they 
produce can be calculated using the quadrupole approx- 
imation. In terms of these approximations, a circular 
Newtonian binary produces waves propagating in the f2 
direction with amplitudes 



A+ = A[l + (L-n) 2 
A x = 2AL-Q 



(36) 
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where 



IV. BINARY SIGNAL MODULATION 



A 



2M 1 M 2 
rd 



(37) 



Here r is the distance between mass Mi and M2, d is the 
distance between the source and the observer, and L is 
a unit vector parallel to the binary's angular momentum 
vector. The gravitational waves have frequency 



/ — 2 / orb 



1 Mx + M-2 



(38) 



The generalization to elliptical Newtonian binaries is 
given in Peters and Mathews |l4j. They found that ellip- 
tical binaries produce gravitational waves at harmonics of 
the orbital frequency / or b- For small eccentricities, most 
of the power is radiated into the second harmonic, with 
the portion of the power radiated into higher harmon- 
ics increasing with increasing eccentricity. From a data 
analysis perspective, an eccentric binary looks like a col- 
lection of circular binaries located at the same position 
on the sky, with frequencies separated by multiples of 
/orb- One strategy to search for eccentric binaries would 
be to conduct a search for individual circular binaries, 
then check to see if binaries at a certain location form 
part of a harmonic series. If they do, the relative am- 
plitude of the harmonics can be used to determine the 
eccentricity. 

The polarization angle of a circular binary is related to 
its ang ular momentum vector orientation, L 

by" 



The effects of amplitude, frequency and phase modula- 
tion on two binary sources with barycenter frequencies of 
10 and 1 mHz are shown in Figures ^ and [5] respectively. 
The sources have all parameters equal save their frequen- 
cies, and are located close to the galactic center. We see 
that frequency modulation dominates at 10 mHz, while 
frequency and phase modulation become comparable at 
1 mHz. 
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FIG. 1: Power spectra showing the effects of frequency (FM), 
phase (PM) and amplitude (AM) modulation separately and 
all together (TM). The gravitational wave has frequency 10 
mHz. 



tan?/; 



t> jj ) sin ol — cos u l sin ( 



sin#L sm((j> - 4>l) 
The inclination of a circular binary, 1 is given by 



, (39) 



COS 2 



-L ■ 

cos 6l cos ( 



■ sin0isin0cos(</>L - </>) (40) 



It follows that the amplitude and phase modulation de- 
pend on four parameters. Two are the sky position (9, (j>) 
and the other two are either the the angular momen- 
tum direction (6l,4>l), or the polarization angle tp and 
the inclination 1. We found (1, ip) to be easier to work 
with as the quadrupole degeneracy between sources with 
parameters (ip,(fo) and (tp + n/2,ipo + n) is explicit in 
these coordinates. The total gravitational wave signal 
from a Newtonian binary depends on seven parameters: 
A — > (9, (j), 1, t/j, tpo, /, .4). The parameter space has 
topology S 2 x T 3 x R 2 . The parameters 9 and <j> range 
over their usual intervals: 9 £ [0,7r] and (j> S [0, 2n]. The 
inclination and polarization have ranges: 1 £ [0, 7r] and 
ip £ [0,7r]. Because of the quadrupole degeneracy dis- 
cussed above, we restrict the range of the orbital phase: 
(po G [0,7r). 
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FIG. 2: Power spectra showing the effects of frequency (FM), 
phase (PM) and amplitude (AM) modulation separately and 
all together (TM). The gravitational wave has frequency 1 
mHz. 



One of the main effects of the modulations is to spread 
the power across a bandwidth B ~ 2(1 + 2?r/f sin0)/ m - 
This, combined with LISA's antenna pattern, means that 
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TABLE I: Properties of the six nearest interacting white dwarf 
binaries. Physical data from Hellier [l5||. periods taken from 
NSSDC catalog 5509 16]. Spectral amplitudes are computed 
using the methods of this paper for one year of observations. 
The masses quoted in units of the solar mass Mq, the orbital 
periods are in seconds, the distances are in parsecs and the 
strain spectral densities are in units of 10 -19 Hz -1 / 2 . 



Name 


mi 


ni2 


Porb 


d 


hf 


hf 


hf/hf 


AM CVn 


0.5 


0.033 


1028.76 


100 


21.2 


2.34 


0.111 


CP Eri 


0.6 


0.02 


1723.68 


200 


5.19 


1.06 


0.205 


CR Boo 


0.6 


0.02 


1471.31 


100 


11.5 


1.63 


0.141 


GP Com 


0.5 


0.02 


2791.58 


200 


3.32 


0.44 


0.133 


HP Lib 


0.6 


0.03 


1118.88 


100 


20.7 


4.53 


0.219 


V803 Cen 


0.6 


0.02 


1611.36 


100 


10.9 


1.89 


0.174 



the strain in the detector is often considerably less than 
the strain of the wave. The effect can be quantified in 
terms of the amplitude of the detector response, A, and 
the intrinsic amplitude of the source A. Suppose that 
a source is responsible for a strain in the detector s(t). 
Defining A as the orbit-averaged response: 

A 2 = ^j\ 2 (t)dt, (41) 

we find from that 

A 2 = X -A 2 ((1 + cos 2 i) 2 {Fl) + 4 cos 2 i(F 2 )) , (42) 

where the orbit-averaged detector responses are given by 

(F 2 ) = - (cos 2 2^(D 2 + ) - sin 4^ (D+D x ) + sin 2 2^(£> 2 )) 

(F 2 ) = - (cos 2 2^(D 2 ) + sin ^(D+D x ) + sin 2 2^(L>+)) 

(F+F x ) = i ( S m4^((D 2 + ) - (Dl)) + 2cos4V(D + D x )) 

(43) 

and 

243 

(D + D x ) = cos 6 sin 20(2 cos 2 - 1)(1 + cos 2 6) 

512 

(D 2 X ) = — (120 sin 2 + cos 2 9 + 162 sin 2 20 cos 2 6) 

(Dl) = — — (487 + 158 cos 2 6> + 7 cos 4 (9 
x +/ 2048 v 

-162 sin 2 20(1 + cos 2 0) 2 ) . (44) 

The relative amplitude A/ A depends on the source decli- 
nation 8, right ascension 0, inclination i and polarization 
angle ip- 

Table 1 illustrates the power spreading effect for the 
six nearest interacting white dwarf binaries. Random 
numbers were used for the unknown parameters i and ip. 



The average strain spectral density in the detector, hf , 
is between 5 and ten times below the strain spectral den- 
sity at the barycenter hj. The effect is more significant 
at higher frequencies since the bandwidth increases with 
frequency. 

The signals from three of these binaries, averaged over 
their bandwidths, are plotted against the standard LISA 
noise curve in Figure |3J The complete signals for all 
six binaries are shown in Figure 0] The strain spectra 
appear as nearly vertical lines of dots due to the highly 
compressed frequency scale. 



-15 




-21 I I I I I 

-5 -4 -3 -2 -1 

logf 

FIG. 3: The strain spectral densities of three nearby interact- 
ing white dwarf binaries plotted against the standard LISA 
noise curve. 
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-3.25 -3 -2.75 -2.5 

logf 

FIG. 4: The modulated strain spectral densities of the six 
nearest interacting white dwarf binaries plotted against the 
standard LISA noise curve. Crosses mark standard estimates 
for the known binaries, while alternate symbols mark modu- 
lated Fourier components. 
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V. TEMPLATE OVERLAP 

A. Template metric 

The templates are constructed by choosing the six pa- 
rameters A — > (/, 9, 4>, i, tp, tpo) an< i forming the noise-free 
detector response corresponding to a source with those 
parameters: 



A) = A(t, A) cos*(t, A) 



(45) 



We need to determine how closely the templates need to 
be spaced to give a desired level of overlap. The overlap 
of templates with parameters Ai and A 2 is defined: 



i?(Ai,A 2 ) 



Kt,Ai)|s(i,A 2 )) 



s(t,X 1 )\s(t,X 1 ))VZ(s(t,X 2 )\s(t,X 2 ))i/ 2 



with the inner product 
(a(t)\b(t)) 



a(t)b(t)dt. 



(46) 



(47) 



Suppose we have two templates, one with parameters A 
and the other with parameters A + <5A. To leading order 
in SX the overlap is given by 



R(X, A + SX) = 1 - g ij (X)AX i AX j 
where gij is the template space metric 

(d iS (t, X)\d jS (t,X)) 



(48) 



9ij(X) = 



2(s(t,X)\s(t,X)) 

(s(t,X)\d lS (t,X))(s(t,X)\d jS (t,X)) 
2{s(t,X)\s(t,X)) 2 



(49) 



Using the fact that ^>(t, A) varies much faster than A(t, A) 
we find 



9ijW 



(diA\djA) + (Ad^\Ad^) 
2{A\A) 
(AldjA^AldjA) 
2(A\A) 2 



(50) 



Ignoring the sub-dominant amplitude and phase mod- 
ulation allows us to analytically compute the "Doppler 
metric" 

ds 2 = '/,, ; AjAA'AA*' 

2tt 2 2 2 1 , 

= — T df + nTdfdipo + -dip 



-2-Kf—Tdfi cos6 sin 
c 

-i?f (^V(cos 2 9d6 2 



sin f cos 



Here T = 1 year is the observation time. We have to go 
beyond the Doppler approximation to find metric com- 
ponents that involve i and %p. The computations are con- 
siderably more involved, as are the resulting expression. 
For example, including all modulations we find 



and 



2(F 2 ) (F 2 ) sin 2 i (sin 2 i + 2 cos 4 1) 
((1 + cos 2 i) 2 (F 2 ) +4cos 2 j(F 2 )) 2 



2(1 + cos 2 i) 2 (F 2 )+ 4 cos 2 t(F 2 ) 
(1 +cos 2 i) 2 {F 2 ) +4cos 2 «(F 2 > 
2sin 4 i(F+F x ) 



(52) 



((1 + cos 2 i) 2 {Fl) +4cos 2 z 



(53) 



We have been able to derived exact expressions for all the 
metric components, but they are cumbersome and not 
very informative. For most purposes the simple Doppler 
metric is sufficient. 



B. Overlap of parameters 

An important application of the Doppler metric in Eq. 
(|51|l is the determination of parameter overlap, which 
has great bearing on the placement of templates. In re- 
gions where large variations of the overlap can be seen for 
small changes in parameters, templates must be spaced 
closely to distinguish between different realizable physi- 
cal situations. In regions where the change in overlap is 
small for small changes in parameters, the templates can 
be spaced more widely. The Doppler metric depends on 
only four parameters. Taking constant slices through the 
parameter space for any two of the four will produce a 
metric which can be used to plot level curves of the over- 
lap function R{X\ 1 A 2 ) as a function of two parameters. 
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sin 9d<j) 2 ) . (51) FIG. 5: The template overlap contours on the (/, <^o) cylinder. 
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Setting d9 



leaves the two dimensional metric 



on the (/, (po) cylinder: 



j 2 2 / 7T 3 , 



7^0- 



(54) 



Using this metric to plot the level sets of the overlap 
function, the contours for 90%, 80% and 70% overlap 
are shown in Figure |3J One rather surprising result is 
that the template overlap drops very quickly with Af. 
According to Nyquist's theorem, the frequency resolution 
observations of time T should equal Jn = 1/T. But we 
see from Figure|S]that the overlap drops to 90% for Af ~ 
/at/10. (Since we are using T = 1 year, it so happens 
that the Nyquist frequency /at equals the modulation 
frequency f m .) 
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FIG. 7: The template overlap contours on the sky 2-sphere in 
the neighborhood of 9 = 7r/4. 
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FIG. 8: The overlap of templates with all parameters equal 
save sky position. The reference template has a frequency of 
/ = 10 mHz and a sky location of (8, <f)) = (tt/4, 0). 



FIG. 6: The overlap of templates with all parameters equal 
save frequency and orbital phase. The reference template has 
(/, ipo) = (0.01, 7r). The frequency resolution was found to be 
independent of the reference frequency. 

Since the metric (|51|) was derived by neglecting ampli- 
tude and phase modulation, it only gives an approximate 
determination of the template overlap. Moreover, the 
approximate metric neglects the (i, ip) dependence com- 
pletely. In order to have a more reliable determination 
of the template overlap we generated a large template 
bank and studied the template overlap directly. We see 
that the template overlap shown in Figure[S]for / vs. ipo 
agrees with what was found in Figure 

Setting df = difo = gives us the metric on the sky 
2-sphere: 



R 



cos 



sin 







(55) 



It is clear from this form of the metric that the angular 
resolution improves at higher frequencies, and that the 
metric is not that of a round 2-sphere. The cos 2 9 factor 
in front of d6 tells us that the 9 resolution drops as we 
near the equator. This is might seem counter-intuitive 
since the Doppler modulation is maximal at the equator 
(it depends on sin#). But, the 9 resolution depends on 
the rate of change of the Doppler modulation with 9, 
which goes as cos 6*. Setting / = 10 mHz, we plot the 
template overlap contours in the neighborhood of 9 = 
tt/4 and 9 = ir/2 in Figures and [5] respectively. 

The template overlaps shown in Figures [S] and ^] (6 
vs. (f>) agree with those in found in Figures and 
The overlaps shown in Figures ITT1 and IT2"1 falso 9 vs. cp) 
confirm our expectation that the angular resolution de- 
creases with decreasing frequency. 

The template overlap as a function of inclination and 
polarization angle turns out to be a very sensitive func- 
tion of location in parameter space. While independent 
of frequency and orbital phase, the metric functions g m 




FIG. 9: The template overlap contours on the sky 2-sphere in 
the neighborhood of 9 = n/2. 
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FIG. 10: The overlap of templates with all parameters equal 
save sky position. The reference template has a frequency 
of / = 10 mHz and is close to the galactic center, (9, cj>) = 
(1.66742,4.65723). 



and Qiji^ range between and 326 as (9, <j>, i, i/j) are varied. 
Taking a uniform sample of 1.6 x 10 9 points in (9, 0, i, ip), 
we found that g tl had a mean value of 0.4664, a median 
value of 0.043, and that 90% of all points had g n < 1.273. 
Similarly, g^ had a mean value of 2.101, a median value 
of 2.0, and that 90% of all points had g^ < 2.251. The 
analytic expressions for g^, g tl and were found to 
agree with direct numerical calculations of the template 
overlap. 



C. Degeneracies 

What the metric can not tell us about are the non- 
local degeneracies that occur in parameter space. A mild 
example of a non-local degeneracy can be seen in Figure 
[5J where there are secondary maxima in the template 
overlap in the southern hemisphere. Physically this oc- 
curs because the dominant Doppler modulation is un- 



FIG. 11: The overlap of templates with all parameters equal 
save sky position. The reference template has a frequency 
of / = 1 mHz and is close to the galactic center, (9, (f>) = 
(1.66742,4.65723). 




FIG. 12: The overlap of templates with all parameters equal 
save sky position. The reference template has a frequency of 
/ = 1 mHz and a sky location of (9, (f>) = (7r/4, 0). 



able to distinguish between sources above and below the 
equator. This strong degeneracy is ameliorated by the 
amplitude and phase modulations, which are sensitive 
to which hemisphere the source is located in. In the 
course of applying the gCLEAN procedure we discovered 
several other much stronger non-local parameter degen- 
eracies. By far the worst were those that involved fre- 
quency and sky location. The secondary maxima some- 
times had overlaps as high as 90%. An example of a non- 
local parameter degeneracy in the f - 9 plane is shown 
in Figure H~3l The reference template has / = 4.999873 
mHz, 9 = 0.5690, = 0.643, i = 1.57, ij> = 0.314, and 
(po = 0.50. The strongest of the secondary maxima is 
located at / = 4.999910 mHz, 9 = 0.4615, and has an 
overlap of 90% with the reference template. In principle, 
a sufficiently fine template grid should always find the 
global maxima, but in practice, detector noise and in- 
terference from other sources can cause gCLEAN to use 
templates from secondary maxima. We will return to 
this issue when discussing the source identification and 
reconstruction procedure. 
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VI. INSTRUMENT NOISE 
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FIG. 13: An example of non-local parameter degeneracies in 
the /, 9 plane. 



In order to construct a demonstration of the gCLEAN 
method, it is necessary not only to characterize the bi- 
nary signals themselves, but also the noise in the detec- 
tor. Instrumental noise can have important consequences 
for the gCLEAN process, particularly in low signal to 
noise ratio situations, where random features in the noise 
spectrum of the instrument could conspire to approxi- 
mate the modulated signal from a binary. 

The total output of the interferometer is given by the 
sum of the signal and the noise: 



h(t) = s(t) + n(t) 



(56) 



Assuming the noise is Gaussian, it can be fully charac- 
terized by the expectation values 

(n(/)> = 0, and (h*(f)n(f')) = ±6(f-f')S n (f), 

(57) 

where S n (f) is the one-sided noise power spectral density. 
It is defined by 



(n 2 (*)> 



dfS n (f) , 



(58) 



D. Counting templates 

Deciding what level of template spacing is acceptable 
depends on two factors: the signal-to- noise level and com- 
puting resources. Given a signal-to-noise level of SNR, 
there is no point having the template overlap exceeding 
~ (1 — 1/SNR) x 100%. For the searches described in the 
next section we made a trade-off between coverage and 
speed, and chose template spacings that gave a minimum 
template overlap of ~ 75% in each parameter direction 
(i.e. with all parameters equal save the one that is var- 
ied). We chose to study sources with frequencies near 
5 mHz, and used a uniform template grid with spacings 
A/ = / m /5, Aip = tt/4, A9 = = 3.7°, Ai = tt/7 
and Aip = tt/9. A better approach would be to vary 
the template spacing according to where the templates 
lie in parameter space. As explained earlier, each set of 
5 x 4 x 3072 x 7 x 9 = 3, 870, 720 templates can be used to 
cover a frequency range of 10 4 / m . At worst, a source may 
lie half way between two templates, soa~ 75% template 
overlap translates into a ~ 92% source overlap. After the 
coarse template bank has been used to find a best match 
with the data, we refine the search in the neighborhood 
of the best match using templates that are spaced twice 
as finely in each parameter direction. 

To implement the gCLEAN procedure, a template 
bank was constructed by gridding the sky using the 
HEALPIX hierarchical, equal area pixelization scheme 
p| . The HEALPIX centers provide sky locations (6,<p) 
to build up families of templates distributed across the 
parameters (/, I, ip, tpo). 



where the angle brackets denote an ensemble average. 
The one-sided power spectral density is related to the 
strain spectral density by S n (f) = \h n (f )\ 2 . 

Expressing the noise as a discrete Fourier transform: 



(59) 



a realization of the noise can be made by drawing the real 
and imaginary parts of rij from a Gaussian distribution 
with zero mean and standard deviation 



hn(fmj) 

V2 



(60) 



The signal-to-noise ratio in a gravitational wave detec- 
tor is traditionally defined as 



SNR(/) 



Sn(f) 



(61) 



where S s (f) is the one-sided power spectral density of the 
instrumental signal pi). Given a particular set of sources, 
each with their own modulation pattern, and a particular 
realization of the noise, the quantity SNR(/) will vary 
wildly from bin to bin. A more useful quantity is obtained 
by comparing the signal to noise over some frequency 
interval of width A/ centered at /: 



SNR(/,A/) = 



' {&(/)} 

{Sn(f)} 



(62) 
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where 

/•/+A//2 

{S(f)}= / S(x)dx. (63) 

J/-A//2 

A good choice is to set A/ equal to the typical bandwidth 
of a source. 



VII. SOURCE IDENTIFICATION AND 
SUBTRACTION 

The procedure for subtraction is intimately tied to 
the task of source identification, as sources with overlap- 
ping bandwidths interfere with each other. Overlapping 
sources have to be identified and removed in a simultane- 
ous, iterative procedure called the gCLEAN algorithm. 

The task of gCLEAN can be understood by thinking 
of the LISA data stream as an N dimensional vector S 
which represents the sum of all the sources the algorithm 
is seeking to subtract. S is called the total source vector. 
The ideal output of the gCLEAN algorithm is a set of 
basis vectors and their amplitudes (i.e., sources Sj) which 
contribute to S: 

S=si+s 2 + ... . (64) 

The basis vectors which contribute to individual 
sources Sj are the unit-normed templates on the parame- 
ter space, tj, built from Eq. (|34|l . In principle, the vector 
space of templates will be quite large, where the number 
of basis vectors M is much greater than the dimension- 
ality N of the source vector. 

A typical application may attempt to CLEAN a fre- 
quency window of width A/. The source vector S has 
dimension N = 2A///jy, where the factor of two ac- 
counts for the real and imaginary parts of the Fourier 
signal. We typically considered frequency windows of 
size A/ « 1 /iHz and observation times of T = 1 year, so 
N s» 60. By contrast, the number of templates used in a 
search over that data stream is of order 10 8 . This discrep- 
ancy in size naturally leads to the possibility of multiple 
solutions, implying that the problem is ill posed. What 
gCLEAN does is return a best-fit solution in much the 
same vein as a singular value decomposition. 



A. CLEANing 

The first step in the gCLEAN procedure is to consider 
the inner product of each template U with the source 
vector S, which represents the data stream from the in- 
terferometer. The "best fit" template, tj, is identified as 
the template with the largest overlap with S, and a small 
amount e is subtracted off: 

S' = S - e(S ■ i s )ij , (65) 



where (S ■ ij)ij is gCLEAN's best estimate of S. The 
template tj and amount removed are recorded for later 
reconstruction. 

The procedure is iterated until only a small fraction of 
the original power remains (for the simulations presented 
below, the fraction was chosen to be 1%). It should be 
emphasized that the data stream that remains after this 
process is not the CLEANed data stream. By design 
gCLEAN will remove a pre-set fraction of the original 
power, no matter what the original signal is composed 
of. It is only after reconstructing the sources from the 
gCLEAN record that we can meaningfully attempt to 
remove a source from the data stream. 

The pieces which are subtracted off in the gCLEAN 
procedure are assumed to be portions of individual 
sources Si, the ensemble of which form the total signal 
S ; during reconstruction these pieces are resummed into 
representations of the individual sources. 



B. Reconstruction 

The gCLEAN procedure cannot produce a perfect 
match with a raw data stream from LISA, due to the 
discrete griding of parameter space, the interference be- 
tween the frequency components of different sources and 
instrument noise. These effects serve to generate sub- 
tracted elements which are close, but not identical to 
each other. The task during reconstruction is to iden- 
tify which combination of subtracted elements are close 
enough together that they are considered to be manifes- 
tations of a single source. 

Reconstruction is implemented by finding the bright- 
est element in the list of saved matches produced by 
gCLEAN, and computing the overlap of this element with 
all other saved elements. For a given overlap thresh- 
old, all sources with strong overlap are considered to be 
"close", and are summed together to represent a sin- 
gle source. The procedure is iterated over the remain- 
ing saved elements until every element in the gCLEAN 
record has contributed to a source. The frequency and 
source location parameters for the reconstructed sources 
are taken to be a weighted average of all the matches 
contributing to that source, where the weighting is given 
by the individual match amplitudes. 

The procedure is complicated by the non-local param- 
eter degeneracies discussed in section IV Bl The recon- 
struction may combine contributions that are close in 
terms of template overlap, but far apart in terms of the 
template metric. It makes no sense to average the pa- 
rameters of metrically distant templates. For this reason 
we only use contributions that are metrically close when 
calculating the weighted averages of the source parame- 
ters. This can lead to several different best fit values for 
the reconstructed source. 

We encounter an additional difficulty when trying to 
reconstruct the source amplitudes A and A. Consider a 
source with amplitude A. If gCLEAN performs n sub- 
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tractions from this source the remaining amplitude will 
be 



A r , 



A(l-ej' 



(66) 



The equation is only approximate as other reconstructed 
sources may have added or subtracted from the source in 
question during the course of the gCLEAN procedure. If 
we simply add together the n contributions identified by 
gCLEAN, the amplitude of the reconstructed source will 
equal 



A r « A(l- (1 



(67) 



In other words, gCLEAN will tend to under estimate the 
amplitude of a source. To compensate, we multiply the 
initial reconstruction by a factor of 1/ (1 — (1 — e) n ) to 
arrive at the final reconstruction which gives a better es- 
timate of A. Using this estimate, along with the weighted 
averages for (#, 0, i, -0), we can use Equation^!* calcu- 
late A. Unfortunately, any errors in the determination 
of (A, 8, <j>, i, -0) adversely affect our determination of A. 
Because of this, the intrinsic amplitude of a source, A, is 
usually the worst determined quantity. 

The reconstruction procedure usually produces more 
reconstructed sources than there were sources in the in- 
put data stream. Most of these additional "sources" have 
very small amplitudes, and their existence can be at- 
tributed to detector noise or the formation of a blended 
version of two or more real sources. For this reason, 
we only consider reconstructed sources with an ampli- 
tude that exceeds the noise in the detector. Occasionally 
gCLEAN gets confused and produces two fits to a single 
source that are nearby in parameter space, but not close 
enough to have been identified as one source. We discuss 
some ideas for getting around this problem in Section 



C. Isolated Sources 

Figure ^] shows the result of a gCLEAN run carried 
out for an isolated source. The source parameters are 
listed in table[n] The strain spectral density of the source 
and detector noise are shown in Figure lLH along with the 
composite source built by gCLEAN. The signal to noise 
ratio was 9.75 across the bandwidth of the signal. 

The output from gCLEAN was then fed through the 
reconstruction procedure using an overlap threshold of 
0.7. The source parameters were estimated by taking a 
weighted average of the template parameters used to cre- 
ate the composite source. These estimates are listed in 
table [H] The reconstruction procedure was able to fit all 
of the source parameters very well save for the intrinsic 
amplitude A. The error in A is primarily due to the error 
in the inclination. The large error in A translates into a 
large error in the distance to the source d. The recon- 
structed parameter values for the source can be fed into 
our detector response model, and the resulting strain can 




4.9995 5 5.0005 5.001 

/ (mHz) 

FIG. 14: The solid line is the strain spectral density of the 
source, the dotted line is that of the noise and the dashed line 
indicates the strain spectral density of the composite source 
created by gCLEAN. 



TABLE II: The parameters for the isolated source example. 
The first row lists the input values while the second row list 
the reconstructed values. 



/ (mHz) 


A 


A 


8 


4> 


% 


V> 




5.000281 


0.556 


0.648 


0.79 


2.21 


2.45 


1.62 


0.71 


5.000280 


0.786 


0.646 


0.79 


2.21 


2.11 


1.63 


0.81 



be subtracted from the data stream. The CLEANed data 
stream is shown in Figure El The residual is comparable 
to the noise in the detector. 
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FIG. 15: The solid line is the CLEANed strain spectral den- 
sity and the dotted line is the original detector noise. 

What we have shown is that the gCLEAN procedure 
is able to successfully remove isolated sources from the 
LISA data stream. The procedure works equally well if 
there are one or one million isolated sources. The key 
is that the sources are isolated, i.e. the signals do not 
overlap in Fourier space. When the sources are overlap- 
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TABLE III: The source parameters used to generate the over- 
lapping signals. 



TABLE IV: The reconstructed sources 
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/ (mHz) 
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A 
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# 


/ (mHz) 


SNR 


A 


A 


e 


<P 






fo 




1 


4.999729 


0.646 


0.942 


0.731 


0.67 


3.33 


1.98 


1.10 


1.22 


1 


4.999729 


14.3 


0.514 


0.741 


0.66 


3.32 


2.86 


1.42 


1.84 




2 


4.999910 


0.204 


0.477 


0.343 


2.85 


0.42 


1.79 


0.44 


1.10 


2 


4.999904 


7.7 


0.322 


0.399 


2.87 


0.26 


2.64 


0.26 


2.00 




3 


5.000214 


0.163 


0.543 


0.314 


1.50 


4.37 


1.57 


1.04 


1.49 


3 


5.000216 


8.8 


0.829 


0.457 


1.40 


4.35 


1.57 


1.10 


1.18 




4 


5.000089 


0.061 


0.868 


0.320 


2.63 


5.33 


1.42 


2.88 


1.67 






















5 


5.000336 


0.050 


0.217 


0.177 


2.36 


4.12 


1.98 


0.98 


1.78 



ping they interfere with each other and the CLEANing 
is more difficult. 



D. Overlapping Sources 

To get a feel for how the gCLEAN procedure copes 
with overlapping sources, we considered three sources 
with barycenter frequencies near 5 mHz that are within 
~ 5 frequency bins of each other. The total signal to 
noise in the simulation was equal to SNR = 19.5. Table 
IIIII list the randomly generated source parameters and 
the signal-to-noise ratio for each source. The modula- 
tions described in SectionlTTlcause the measured strains to 
overlap in frequency space. The composite strain spectral 
density produced by the three sources is shown in Figure 
1161 along with the detector noise used in the simulation. 
Also shown is the residual strain after the three recon- 
structed sources have been subtracted from the original 
data stream. 
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FIG. 16: The strain spectral density for the overlapping 
source example. The solid line is the signal, the dashed line is 
the residual strain (i.e. the CLEANed signal), and the dotted 
line is the detector noise. 

The usual procedure was followed: The simulated 
LISA data stream was fed into the gCLEAN algorithm, 
and the output from the gCLEAN run was used to recon- 
struct the sources. An overlap threshold of 0.7 was used 
in the reconstruction. The reconstruction produced five 
reconstructed sources with signal to noise ratios greater 
than one. The reconstructed source parameters are listed 



in Table ILVI The first three reconstructed sources are fair 
reproductions of the input sources. The frequencies and 
sky locations are well determined, but there are larger 
errors in the determination of the inclination, polariza- 
tion angle, and orbital phase. The strain amplitude in 
the detector, A, is fairly well determined for sources 1 
and 2, but poorly determined for source 3. Once again, 
the intrinsic amplitude of each source, A, is the least well 
determined parameter. 

In addition to recovering the three input sources, the 
reconstruction procedure produced two spurious sources. 
The degree to which these sources were used by the 
gCLEAN procedure is measured by the amplitude A r . 
It is clear from the A r values that the first three recon- 
structed sources played a much more significant role in 
the gCLEAN procedure than the two spurious sources. 
In terms of the amount of signal removed during the the 
gCLEAN procedure, the reconstructed sources had signal 
to noise ratios of 12.1, 3.9, 4.3, 1.6, and 1.1 respectively. 
This suggests that we should only consider sources with 
signal to noise ratios above SNR ~ 2 when performing 
the reconstruction. Our hope is that, when wc implement 



some of the improvements described in Section fVIIII the 
CLEANing procedure will produce fewer and weaker spu- 
rious reconstructions. 

The strain spectral densities for the sources and their 
reconstructions are shown in Figure ITTl The reconstruc- 
tion procedure underestimates the amplitudes of sources 
2 and 3. This can be attributed to the power lost to spu- 
rious sources in the gCLEAN procedure. The CLEANed 
strain spectral density shown in Figure [TBI was produced 
by subtracting the three reconstructed sources shown in 
Figure lT7l from the input data stream. The residual strain 
is down by a factor of ~ 10 from the input level, but is 
still considerably larger than the detector noise. The 
goal of future work will be to improve the source iden- 
tification and subtraction procedure to the point where 
multiple sources with overlapping signals can be removed 
from the LISA data stream leaving a residual that is com- 
parable to the instrument noise. 



VIII. FUTURE WORK 

The gCLEAN algorithm described here is only the first 
step in a program to understand how to remove binaries 
from the LISA data stream. In particular, the limitation 
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FIG. 17: The strain spectral densities of the three sources 
(solid lines) and their reconstructions (dashed lines). 



of the simulations presented here are for small numbers 
of binaries, and at frequencies above the expected regime 
where multiple overlapping binaries contribute power in 
every bin of the power spectrum (this occurs at / ~ 3 
mHz, for an assumed bin width of — 1/year). 

A key question is how effectively can gCLEAN identify 
binaries which have merged together to form a confusion 
limited background? While gCLEAN will subtract any 
signal out of the data stream down to a prescribed level 
in total power, using as many templates as necessary to 
remove the "signal" , the real question is how well can it 
identify individual sources for later removal? Information 
theory predicts an ultimate bound on the number of bi- 
naries which can be fit out of the LISA data stream , 
and an important question is how closely can gCLEAN 
approach this optimal limit. 

A great deal of work has yet to be done in the area 
of optimizing the gCLEAN procedure and making it an 
effective tool in the LISA data analysis arsenal; many of 
them are obvious extensions to the initial foray presented 



in this work. Of particular interest is extending gCLEAN 
to work with multiple data streams. The design of the 
LISA observatory provides three different data streams, 
which can be combined in various ways . As currently 
implemented, gCLEAN only uses a single data stream. 
We are in the process of upgrading the algorithm so that 
all data streams are used. It is our hope that some of the 
parameter degeneracies described in Section IV CI will be 
broken when more than one interferometer signal is used. 
At the very least, we expect the parameter estimation to 
be improved. It would also be interesting to see how 
much better the algorithm performs if we use more than 
one year of observations. 

The placement of templates in parameter space is an 
area where improvements in efficiency can be imple- 
mented. Templates now are spaced for convenience (e.g., 
points on the sky are spaced on the HEALPIX centers, 
which are effective for visualization), but efficient tem- 
plate spacing should be developed based on the local val- 
ues of the metric on the template space. 

There are also several unresolved questions about the 
gCLEAN algorithm and the ultimate limits of its perfor- 
mance on real scientific data. Of particular interest is 
how will gCLEAN perform when other signals, such as 
those from supermassive black hole binaries, are present 
in the data? The research presented here has been for the 
case where only circular Newtonian binaries are present 
in the data stream. It is clear from the way gCLEAN is 
designed to work that it will indiscriminately remove sig- 
nals from a data stream; this has important implications 
for how gCLEAN should be included in the approach to 
LISA data analysis. How will gCLEAN deal with chirp- 
ing binaries, or signals from extreme-mass ratio inspirals? 
Can gCLEAN be used in a sequential analysis strategy, 
where it is used to first subtract out monochromatic bi- 
naries before looking for other gravitational wave events, 
or do all signals have to be simultaneously gCLEANed 
using templates for each individual type of source? 

There may be better ways to extract the best fit pa- 
rameter values for the reconstructed sources. Currently 
we use a weighted average of the parameters that describe 
the templates used to build up the reconstructed source. 
A better approach may be to take each reconstructed 
source and use a hierarchical search through parameter 
space to find which set of parameters give the best match 
to the reconstructed source. 

Another avenue of research is devising better strate- 
gies for accurately fitting sources. One idea is to at- 
tempt multi-fitting, where gCLEAN removes more than 
one source at a time. The parameter space scales as 
6", where n is the number of sources which the algo- 
rithm is attempting to subtract out at once. With this in 
mind, it is obvious that one would have to establish initial 
estimates of the parameters from a standard gCLEAN 
pass in order to narrow the search area of the param- 
eter space used in the multi-fitting. An aspect of the 
subtraction enterprise which might benefit from such a 
procedure is what to do with orphaned sources which 
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are generated during reconstruction but do not meet the 
threshold requirements to be included in the final list of 
identified sources. These orphans represent subtractions 
on the part of gCLEAN which arise from either fluctua- 
tions in the detector noise which produces a close match 
with templates, or more commonly, interference between 
the signals of multiple sources which produced a strong 
match during the gCLEAN iterations. This type of sub- 
traction is inevitable, as gCLEAN has no a priori way of 
distinguishing interfering sources from isolated sources; 
it relies only on its ability to match the current version 
of the data stream to its space of templates. 

There are many obvious avenues of refinement which 
should be pursued in future work to develop the gCLEAN 
algorithm. We are working on several of the issues de- 
scribed above, and we encourage others to pursue aspects 
of the problem which are of interest to them. To aid in 
the exploration of the strengths and weaknesses of the 



gCLEAN algorithm, the analysis codes used to produce 
the results in this paper will be made available to the 
scientific community through the Working Groups of the 
LISA project. 
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